Skip to content

[pull] master from torvalds:master#73

Merged
pull[bot] merged 282 commits intolinux-doc:masterfrom
torvalds:master
Oct 12, 2020
Merged

[pull] master from torvalds:master#73
pull[bot] merged 282 commits intolinux-doc:masterfrom
torvalds:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Oct 12, 2020

See Commits and Changes for more details.


Created by pull[bot]. Want to support this open source service? Please star it : )

Fenghua Yu and others added 30 commits August 26, 2020 17:53
Early Intel hardware implementations of Memory Bandwidth Allocation (MBA)
could only control bandwidth at the processor core level. This meant that
when two processes with different bandwidth allocations ran simultaneously
on the same core the hardware had to resolve this difference. It did so by
applying the higher throttling value (lower bandwidth) to both processes.

Newer implementations can apply different throttling values to each
thread on a core.

Introduce a new resctrl file, "thread_throttle_mode", on Intel systems
that shows to the user how throttling values are allocated, per-core or
per-thread.

On systems that support per-core throttling, the file will display "max".
On newer systems that support per-thread throttling, the file will display
"per-thread".

AMD confirmed in [1] that AMD bandwidth allocation is already at thread
level but that the AMD implementation does not use a memory delay
throttle mode. So to avoid confusion the thread throttling mode would be
UNDEFINED on AMD systems and the "thread_throttle_mode" file will not be
visible.

Originally-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1598296281-127595-3-git-send-email-fenghua.yu@intel.com
Link: [1] https://lore.kernel.org/lkml/18d277fd-6523-319c-d560-66b63ff606b8@amd.com
A long time ago, Linux cleared IA32_MCG_STATUS at the very end of machine
check processing.

Then, some fancy recovery and IST manipulation was added in:

  d4812e1 ("x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks")

and clearing IA32_MCG_STATUS was pulled earlier in the function.

Next change moved the actual recovery out of do_machine_check() and
just used task_work_add() to schedule it later (before returning to the
user):

  5567d11 ("x86/mce: Send #MC singal from task work")

Most recently the fancy IST footwork was removed as no longer needed:

  b052df3 ("x86/entry: Get rid of ist_begin/end_non_atomic()")

At this point there is no reason remaining to clear IA32_MCG_STATUS early.
It can move back to the very end of the function.

Also move sync_core(). The comments for this function say that it should
only be called when instructions have been changed/re-mapped. Recovery
for an instruction fetch may change the physical address. But that
doesn't happen until the scheduled work runs (which could be on another
CPU).

 [ bp: Massage commit message. ]

Reported-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200824221237.5397-1-tony.luck@intel.com
The ptrace() test forgot to reap its child.  Reap it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/e7700a503f30e79ab35a63103938a19893dbeff2.1598461151.git.luto@kernel.org
…LDT GS

This tests commit:

  8ab4952 ("x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task")

Unpatched kernels will OOPS.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/c618ae86d1f757e01b1a8e79869f553cb88acf9a.1598461151.git.luto@kernel.org
Remove asm/io_apic.h which is included more than once.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Link: https://lkml.kernel.org/r/20200819112910.7629-1-wanghai38@huawei.com
…ions

Intel TSX suspend load tracking instructions aim to give a way to choose
which memory accesses do not need to be tracked in the TSX read set. Add
TSX suspend load tracking CPUID feature flag TSXLDTRK for enumeration.

A processor supports Intel TSX suspend load address tracking if
CPUID.0x07.0x0:EDX[16] is present. Two instructions XSUSLDTRK, XRESLDTRK
are available when this feature is present.

The CPU feature flag is shown as "tsxldtrk" in /proc/cpuinfo.

Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1598316478-23337-2-git-send-email-cathy.zhang@intel.com
TSX suspend load tracking instruction is supported by the Intel uarch
Sapphire Rapids. It aims to give a way to choose which memory accesses
do not need to be tracked in the TSX read set. It's availability is
indicated as CPUID.(EAX=7,ECX=0):EDX[bit 16].

Expose TSX Suspend Load Address Tracking feature in KVM CPUID, so KVM
could pass this information to guests and they can make use of this
feature accordingly.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1598316478-23337-3-git-send-email-cathy.zhang@intel.com
/proc/cpuinfo shows features which the kernel supports. Some of these
flags are derived from CPUID, and others are derived from other sources,
including some that are entirely software-based. Currently, there is
not any documentation in the kernel about how /proc/cpuinfo flags are
generated and what it means when they are missing.

Add documentation for /proc/cpuinfo feature flags enumeration.
Document how and when x86 feature flags are used. Also discuss what
their presence or absence mean for the kernel and users.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Co-developed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200831183500.15481-1-kyung.min.park@intel.com
Fix build warning since this file is already listed in
include/asm-generic/Kbuild.

../scripts/Makefile.asm-generic:25: redundant generic-y found in arch/microblaze/include/asm/Kbuild: hw_irq.h

Fixes: 630f289 ("asm-generic: make more kernel-space headers mandatory")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/4d992aee-8a69-1769-e622-8d6d6e316346@infradead.org
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
When pci_get_device_func() fails, the driver doesn't need to execute
pci_dev_put(). mci should still be freed, though, to prevent a memory
leak. When pci_enable_device() fails, the error injection PCI device
"einj" doesn't need to be disabled either.

 [ bp: Massage commit message, rename label to "bail_mc_free". ]

Fixes: 52608ba ("i5100_edac: probe for device 19 function 0")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200826121437.31606-1-dinghao.liu@zju.edu.cn
Most of the compat vDSO code can be built and guarded using IS_ENABLED,
so drop the unnecessary #ifdefs.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
There's really no need to put every parameter on a new line when calling
a function with a long name, so reformat the *setup_additional_pages()
functions in the vDSO setup code to follow the usual conventions.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
platform_get_irq() returns a negative error number on error. In such a
case, comparison to 0 would pass the check therefore check the return
value properly, whether it is negative.

 [ bp: Massage commit message. ]

Fixes: 9b7e624 ("EDAC, aspeed: Add an Aspeed AST2500 EDAC driver")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Stefan Schaeckeler <schaecsn@gmx.net>
Link: https://lkml.kernel.org/r/20200827070743.26628-1-krzk@kernel.org
platform_get_irq() returns a negative error number on error. In such a
case, comparison to 0 would pass the check therefore check the return
value properly, whether it is negative.

 [ bp: Massage commit message. ]

Fixes: 86a18ee ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tero Kristo <t-kristo@ti.com>
Link: https://lkml.kernel.org/r/20200827070743.26628-2-krzk@kernel.org
Add Memory Tagging Extension system register definitions together with
the relevant bitfields.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Once user space is given access to tagged memory, the kernel must be
able to clear/save/restore tags visible to the user. This is done via
the linear mapping, therefore map it as such. The new MT_NORMAL_TAGGED
index for MAIR_EL1 is initially mapped as Normal memory and later
changed to Normal Tagged via the cpufeature infrastructure. From a
mismatched attribute aliases perspective, the Tagged memory is
considered a permission and it won't lead to undefined behaviour.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Add the cpufeature and hwcap entries to detect the presence of MTE. Any
secondary CPU not supporting the feature, if detected on the boot CPU,
will be parked.

Add the minimum SCTLR_EL1 and HCR_EL2 bits for enabling MTE. The Normal
Tagged memory type is configured in MAIR_EL1 before the MMU is enabled
in order to avoid disrupting other CPUs in the CnP domain.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
inst.h was included in calling.h solely to instantiate the RDPID macro.
The usage of RDPID was removed in

  6a3ea3e ("x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM")

so remove the include.

Fixes: 6a3ea3e ("x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM")
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200827171735.93825-1-ubizjak@gmail.com
KVM does not support MTE in guests yet, so clear the corresponding field
in the ID_AA64PFR1_EL1 register. In addition, inject an undefined
exception in the guest if it accesses one of the GCR_EL1, RGSR_EL1,
TFSR_EL1 or TFSRE0_EL1 registers. While the emulate_sys_reg() function
already injects an undefined exception, this patch prevents the
unnecessary printk.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Price <steven.price@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Add MTE-specific SIGSEGV codes to siginfo.h and update the x86
BUILD_BUG_ON(NSIGSEGV != 7) compile check.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
[catalin.marinas@arm.com: renamed precise/imprecise to sync/async]
[catalin.marinas@arm.com: dropped #ifdef __aarch64__, renumbered]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
The Memory Tagging Extension has two modes of notifying a tag check
fault at EL0, configurable through the SCTLR_EL1.TCF0 field:

1. Synchronous raising of a Data Abort exception with DFSC 17.
2. Asynchronous setting of a cumulative bit in TFSRE0_EL1.

Add the exception handler for the synchronous exception and handling of
the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in
do_notify_resume().

On a tag check failure in user-space, whether synchronous or
asynchronous, a SIGSEGV will be raised on the faulting thread.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
For arm64 MTE support it is necessary to be able to mark pages that
contain user space visible tags that will need to be saved/restored e.g.
when swapped out.

To support this add a new arch specific flag (PG_arch_2). This flag is
only available on 64-bit architectures due to the limited number of
spare page flags on the 32-bit ones.

Signed-off-by: Steven Price <steven.price@arm.com>
[catalin.marinas@arm.com: use CONFIG_64BIT for guarding this new flag]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
When a huge page is split into normal pages, part of the head page flags
are transferred to the tail pages. However, the PG_arch_* flags are not
part of the preserved set.

PG_arch_2 is used by the arm64 MTE support to mark pages that have valid
tags. The absence of such flag would cause the arm64 set_pte_at() to
clear the tags in order to avoid stale tags exposed to user or the
swapping out hooks to ignore the tags. Not preserving PG_arch_2 on huge
page splitting leads to tag corruption in the tail pages.

Preserve the newly added PG_arch_2 flag in __split_huge_page_tail().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
…ROT_MTE

Pages allocated by the kernel are not guaranteed to have the tags
zeroed, especially as the kernel does not (yet) use MTE itself. To
ensure the user can still access such pages when mapped into its address
space, clear the tags via set_pte_at(). A new page flag - PG_mte_tagged
(PG_arch_2) - is used to track pages with valid allocation tags.

Since the zero page is mapped as pte_special(), it won't be covered by
the above set_pte_at() mechanism. Clear its tags during early MTE
initialisation.

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
When the Memory Tagging Extension is enabled, the tags need to be
preserved across page copy (e.g. for copy-on-write, page migration).

Introduce MTE-aware copy_{user_,}highpage() functions to copy tags to
the destination if the source page has the PG_mte_tagged flag set.
copy_user_page() does not need to handle tag copying since, with this
patch, it is only called by the DAX code where there is no source page
structure (and no source tags).

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Since clear_user_page() calls clear_page() directly, avoid the
unnecessary indirection.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
When the Memory Tagging Extension is enabled, two pages are identical
only if both their data and tags are identical.

Make the generic memcmp_pages() a __weak function and add an
arm64-specific implementation which returns non-zero if any of the two
pages contain valid MTE tags (PG_mte_tagged set). There isn't much
benefit in comparing the tags of two pages since these are normally used
for heap allocations and likely to differ anyway.

Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Similarly to arch_calc_vm_prot_bits(), introduce a dummy
arch_calc_vm_flag_bits() invoked from calc_vm_flag_bits(). This macro
can be overridden by architectures to insert specific VM_* flags derived
from the mmap() MAP_* flags.

Signed-off-by: Kevin Brodsky <Kevin.Brodsky@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
To enable tagging on a memory range, the user must explicitly opt in via
a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
memory type in the AttrIndx field of a pte, simplify the or'ing of these
bits over the protection_map[] attributes by making MT_NORMAL index 0.

There are two conditions for arch_vm_get_page_prot() to return the
MT_NORMAL_TAGGED memory type: (1) the user requested it via PROT_MTE,
registered as VM_MTE in the vm_flags, and (2) the vma supports MTE,
decided during the mmap() call (only) and registered as VM_MTE_ALLOWED.

arch_calc_vm_prot_bits() is responsible for registering the user request
as VM_MTE. The newly introduced arch_calc_vm_flag_bits() sets
VM_MTE_ALLOWED if the mapping is MAP_ANONYMOUS. An MTE-capable
filesystem (RAM-based) may be able to set VM_MTE_ALLOWED during its
mmap() file ops call.

In addition, update VM_DATA_DEFAULT_FLAGS to allow mprotect(PROT_MTE) on
stack or brk area.

The Linux mmap() syscall currently ignores unknown PROT_* flags. In the
presence of MTE, an mmap(PROT_MTE) on a file which does not support MTE
will not report an error and the memory will not be mapped as Normal
Tagged. For consistency, mprotect(PROT_MTE) will not report an error
either if the memory range does not support MTE. Two subsequent patches
in the series will propose tightening of this behaviour.

Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Similarly to arch_validate_prot() called from do_mprotect_pkey(), an
architecture may need to sanity-check the new vm_flags.

Define a dummy function always returning true. In addition to
do_mprotect_pkey(), also invoke it from mmap_region() prior to updating
vma->vm_page_prot to allow the architecture code to veto potentially
inconsistent vm_flags.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Andre-ARM and others added 23 commits October 7, 2020 14:17
Commit 9bceb80 ("arm64: kaslr: Use standard early random
function") removed the direct calls of the __arm64_rndr() and
__early_cpu_has_rndr() functions, but left the dummy prototypes in the
 #else branch of the #ifdef CONFIG_ARCH_RANDOM guard.

Remove the redundant prototypes, as they have no users outside of
this header file.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20201006194453.36519-1-andre.przywara@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Late patches for 5.10: MTE selftests, minor KCSAN preparation and removal
of some unused prototypes.

(Amit Daniel Kachhap and others)
* for-next/late-arrivals:
  arm64: random: Remove no longer needed prototypes
  arm64: initialize per-cpu offsets earlier
  kselftest/arm64: Check mte tagged user address in kernel
  kselftest/arm64: Verify KSM page merge for MTE pages
  kselftest/arm64: Verify all different mmap MTE options
  kselftest/arm64: Check forked child mte memory accessibility
  kselftest/arm64: Verify mte tag inclusion via prctl
  kselftest/arm64: Add utilities and a test to validate mte memory
Carve out the MOVDIR64B inline asm primitive into a generic helper so
that it can be used by other functions. Move it to special_insns.h and
have iosubmit_cmds512() call it.

 [ bp: Massage commit message. ]

Suggested-by: Michael Matz <matz@suse.de>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201005151126.657029-2-dave.jiang@intel.com
Currently, the MOVDIR64B instruction is used to atomically submit
64-byte work descriptors to devices. Although it can encounter errors
like device queue full, command not accepted, device not ready, etc when
writing to a device MMIO, MOVDIR64B can not report back on errors from
the device itself. This means that MOVDIR64B users need to separately
interact with a device to see if a descriptor was successfully queued,
which slows down device interactions.

ENQCMD and ENQCMDS also atomically submit 64-byte work descriptors
to devices. But, they *can* report back errors directly from the
device, such as if the device was busy, or device not enabled or does
not support the command. This immediate feedback from the submission
instruction itself reduces the number of interactions with the device
and can greatly increase efficiency.

ENQCMD can be used at any privilege level, but can effectively only
submit work on behalf of the current process. ENQCMDS is a ring0-only
instruction and can explicitly specify a process context instead of
being tied to the current process or needing to reprogram the IA32_PASID
MSR.

Use ENQCMDS for work submission within the kernel because a Process
Address ID (PASID) is setup to translate the kernel virtual address
space. This PASID is provided to ENQCMDS from the descriptor structure
submitted to the device and not retrieved from IA32_PASID MSR, which is
setup for the current user address space.

See Intel Software Developer’s Manual for more information on the
instructions.

 [ bp:
   - Make operand constraints like movdir64b() because both insns are
     basically doing the same thing, more or less.
   - Fixup comments and cleanup. ]

Link: https://lkml.kernel.org/r/20200924180041.34056-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20201005151126.657029-3-dave.jiang@intel.com
Add asm/mce.h to asm/asm-prototypes.h so that that asm symbol's checksum
can be generated in order to support CONFIG_MODVERSIONS with it and fix:

  WARNING: modpost: EXPORT symbol "copy_mc_fragile" [vmlinux] version \
	  generation failed, symbol will not be versioned.

For reference see:

  4efca4e ("kbuild: modversions for EXPORT_SYMBOL() for asm")
  334bb77 ("x86/kbuild: enable modversions for symbols exported from asm")

Fixes: ec6347b ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()")
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201007111447.GA23257@zn.tnic
This reverts commit 353e228.

Qian Cai reports that TX2 no longer boots with his .config as it appears
that task_cpu() gets instrumented and used before KASAN has been
initialised.

Although Mark has a proposed fix, let's take the safe option of reverting
this for now and sorting it out properly later.

Link: https://lore.kernel.org/r/711bc57a314d8d646b41307008db2845b7537b3d.camel@redhat.com
Reported-by: Qian Cai <cai@redhat.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
AMD Family 19h Models 20h-2Fh use the same PCI IDs as Family 17h Models
70h-7Fh. The same family ops and number of channels also apply.

Use the Family17h Model 70h family_type and ops for Family 19h Models
20h-2Fh. Update the controller name to match the system.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201009171803.3214354-1-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
…rnel/git/jarkko/linux-tpmdd

Pull tpm updates from Jarkko Sakkinen:
 "Support for a new TPM device and fixes and Git URL change (infraded ->
  korg)"

* tag 'tpmdd-next-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  MAINTAINERS: TPM DEVICE DRIVER: Update GIT
  tpm_tis: Add a check for invalid status
  tpm: use %*ph to print small buffer
  dt-bindings: Add SynQucer TPM MMIO as a trivial device
  tpm: tis: add support for MMIO TPM on SynQuacer
…el/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "There's quite a lot of code here, but much of it is due to the
  addition of a new PMU driver as well as some arm64-specific selftests
  which is an area where we've traditionally been lagging a bit.

  In terms of exciting features, this includes support for the Memory
  Tagging Extension which narrowly missed 5.9, hopefully allowing
  userspace to run with use-after-free detection in production on CPUs
  that support it. Work is ongoing to integrate the feature with KASAN
  for 5.11.

  Another change that I'm excited about (assuming they get the hardware
  right) is preparing the ASID allocator for sharing the CPU page-table
  with the SMMU. Those changes will also come in via Joerg with the
  IOMMU pull.

  We do stray outside of our usual directories in a few places, mostly
  due to core changes required by MTE. Although much of this has been
  Acked, there were a couple of places where we unfortunately didn't get
  any review feedback.

  Other than that, we ran into a handful of minor conflicts in -next,
  but nothing that should post any issues.

  Summary:

   - Userspace support for the Memory Tagging Extension introduced by
     Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.

   - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
     switching.

   - Fix and subsequent rewrite of our Spectre mitigations, including
     the addition of support for PR_SPEC_DISABLE_NOEXEC.

   - Support for the Armv8.3 Pointer Authentication enhancements.

   - Support for ASID pinning, which is required when sharing
     page-tables with the SMMU.

   - MM updates, including treating flush_tlb_fix_spurious_fault() as a
     no-op.

   - Perf/PMU driver updates, including addition of the ARM CMN PMU
     driver and also support to handle CPU PMU IRQs as NMIs.

   - Allow prefetchable PCI BARs to be exposed to userspace using normal
     non-cacheable mappings.

   - Implementation of ARCH_STACKWALK for unwinding.

   - Improve reporting of unexpected kernel traps due to BPF JIT
     failure.

   - Improve robustness of user-visible HWCAP strings and their
     corresponding numerical constants.

   - Removal of TEXT_OFFSET.

   - Removal of some unused functions, parameters and prototypes.

   - Removal of MPIDR-based topology detection in favour of firmware
     description.

   - Cleanups to handling of SVE and FPSIMD register state in
     preparation for potential future optimisation of handling across
     syscalls.

   - Cleanups to the SDEI driver in preparation for support in KVM.

   - Miscellaneous cleanups and refactoring work"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
  Revert "arm64: initialize per-cpu offsets earlier"
  arm64: random: Remove no longer needed prototypes
  arm64: initialize per-cpu offsets earlier
  kselftest/arm64: Check mte tagged user address in kernel
  kselftest/arm64: Verify KSM page merge for MTE pages
  kselftest/arm64: Verify all different mmap MTE options
  kselftest/arm64: Check forked child mte memory accessibility
  kselftest/arm64: Verify mte tag inclusion via prctl
  kselftest/arm64: Add utilities and a test to validate mte memory
  perf: arm-cmn: Fix conversion specifiers for node type
  perf: arm-cmn: Fix unsigned comparison to less than zero
  arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
  arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
  arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
  arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
  KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
  arm64: Get rid of arm64_ssbd_state
  KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
  KVM: arm64: Get rid of kvm_arm_have_ssbd()
  KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
  ...
Pull Microblaze build warning fix from Michal Simek.

* tag 'microblaze-v5.10' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: fix kbuild redundant file warning
…/kernel/git/geert/linux-m68k

Pull m68k updates from Geert Uytterhoeven:

  - Conversion of the Mac IDE driver to a platform driver

  - Minor cleanups and fixes

* tag 'm68k-for-v5.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  ide/macide: Convert Mac IDE driver to platform driver
  m68k: Replace HTTP links with HTTPS ones
  m68k: mm: Remove superfluous memblock_alloc*() casts
  m68k: mm: Use PAGE_ALIGNED() helper
  m68k: Sort selects in main Kconfig
  m68k: amiga: Clean up Amiga hardware configuration
  m68k: Revive _TIF_* masks
  m68k: Correct some typos in comments
  m68k: Use get_kernel_nofault() in show_registers()
  zorro: Fix address space collision message with RAM expansion boards
  m68k: amiga: Fix Denise detection on OCS
…nux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add Amazon's Annapurna Labs memory controller EDAC driver (Talel
   Shenhar)

 - New AMD CPUs support (Yazen Ghannam)

 - The usual misc fixes and cleanups all over the subsystem

* tag 'edac_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/amd64: Set proper family type for Family 19h Models 20h-2Fh
  EDAC/mc_sysfs: Add missing newlines when printing {max,dimm}_location
  EDAC/aspeed: Use module_platform_driver() to simplify
  EDAC, sb_edac: Simplify switch statement
  EDAC/ti: Fix handling of platform_get_irq() error
  EDAC/aspeed: Fix handling of platform_get_irq() error
  EDAC/i5100: Fix error handling order in i5100_init_one()
  EDAC/highbank: Handover Calxeda Highbank maintenance to Andre Przywara
  EDAC/socfpga: Transfer SoCFPGA EDAC maintainership
  EDAC/thunderx: Make symbol lmc_dfs_ents static
  EDAC/al-mc-edac: Add Amazon's Annapurna Labs Memory Controller driver
  dt-bindings: EDAC: Add Amazon's Annapurna Labs Memory Controller binding
  EDAC/mce_amd: Add new error descriptions for existing types
  EDAC: Replace HTTP links with HTTPS ones
…ux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Extend the recovery from MCE in kernel space also to processes which
   encounter an MCE in kernel space but while copying from user memory
   by sending them a SIGBUS on return to user space and umapping the
   faulty memory, by Tony Luck and Youquan Song.

 - memcpy_mcsafe() rework by splitting the functionality into
   copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables
   support for new hardware which can recover from a machine check
   encountered during a fast string copy and makes that the default and
   lets the older hardware which does not support that advance recovery,
   opt in to use the old, fragile, slow variant, by Dan Williams.

 - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.

 - Do not use MSR-tracing accessors in #MC context and flag any fault
   while accessing MCA architectural MSRs as an architectural violation
   with the hope that such hw/fw misdesigns are caught early during the
   hw eval phase and they don't make it into production.

 - Misc fixes, improvements and cleanups, as always.

* tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Allow for copy_mc_fragile symbol checksum to be generated
  x86/mce: Decode a kernel instruction to determine if it is copying from user
  x86/mce: Recover from poison found while copying from user space
  x86/mce: Avoid tail copy when machine check terminated a copy from user
  x86/mce: Add _ASM_EXTABLE_CPY for copy user access
  x86/mce: Provide method to find out the type of an exception handler
  x86/mce: Pass pointer to saved pt_regs to severity calculation routines
  x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
  x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
  x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list
  x86/mce: Add Skylake quirk for patrol scrub reported errors
  RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE()
  x86/mce: Annotate mce_rd/wrmsrl() with noinstr
  x86/mce/dev-mcelog: Do not update kflags on AMD systems
  x86/mce: Stop mce_reign() from re-computing severity for every CPU
  x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
  x86/mce: Increase maximum number of banks to 64
  x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check()
  x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap
  RAS/CEC: Fix cec_init() prototype
…ernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

 - Add support for hardware-enforced cache coherency on AMD which
   obviates the need to flush cachelines before changing the PTE
   encryption bit (Krish Sadhukhan)

 - Add Centaur initialization support for families >= 7 (Tony W Wang-oc)

 - Add a feature flag for, and expose TSX suspend load tracking feature
   to KVM (Cathy Zhang)

 - Emulate SLDT and STR so that windows programs don't crash on UMIP
   machines (Brendan Shanks and Ricardo Neri)

 - Use the new SERIALIZE insn on Intel hardware which supports it
   (Ricardo Neri)

 - Misc cleanups and fixes

* tag 'x86_cpu_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM: SVM: Don't flush cache if hardware enforces cache coherency across encryption domains
  x86/mm/pat: Don't flush cache if hardware enforces cache coherency across encryption domnains
  x86/cpu: Add hardware-enforced cache coherency as a CPUID feature
  x86/cpu/centaur: Add Centaur family >=7 CPUs initialization support
  x86/cpu/centaur: Replace two-condition switch-case with an if statement
  x86/kvm: Expose TSX Suspend Load Tracking feature
  x86/cpufeatures: Enumerate TSX suspend load address tracking instructions
  x86/umip: Add emulation/spoofing for SLDT and STR instructions
  x86/cpu: Fix typos and improve the comments in sync_core()
  x86/cpu: Use XGETBV and XSETBV mnemonics in fpu/internal.h
  x86/cpu: Use SERIALIZE in sync_core() when available
…nux/kernel/git/tip/tip

Pull x86 platform updates from Borislav Petkov:

 - Cleanup different aspects of the UV code and start adding support for
   the new UV5 class of systems (Mike Travis)

 - Use a flexible array for a dynamically sized struct uv_rtc_timer_head
   (Gustavo A. R. Silva)

* tag 'x86_platform_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/uv: Update Copyrights to conform to HPE standards
  x86/platform/uv: Update for UV5 NMI MMR changes
  x86/platform/uv: Update UV5 TSC checking
  x86/platform/uv: Update node present counting
  x86/platform/uv: Update UV5 MMR references in UV GRU
  x86/platform/uv: Adjust GAM MMR references affected by UV5 updates
  x86/platform/uv: Update MMIOH references based on new UV5 MMRs
  x86/platform/uv: Add and decode Arch Type in UVsystab
  x86/platform/uv: Add UV5 direct references
  x86/platform/uv: Update UV MMRs for UV5
  drivers/misc/sgi-xp: Adjust references in UV kernel modules
  x86/platform/uv: Remove SCIR MMR references for UV systems
  x86/platform/uv: Remove UV BAU TLB Shootdown Handler
  x86/uv/time: Use a flexible array in struct uv_rtc_timer_head
…kernel/git/tip/tip

Pull x86 PASID updates from Borislav Petkov:
 "Initial support for sharing virtual addresses between the CPU and
  devices which doesn't need pinning of pages for DMA anymore.

  Add support for the command submission to devices using new x86
  instructions like ENQCMD{,S} and MOVDIR64B. In addition, add support
  for process address space identifiers (PASIDs) which are referenced by
  those command submission instructions along with the handling of the
  PASID state on context switch as another extended state.

  Work by Fenghua Yu, Ashok Raj, Yu-cheng Yu and Dave Jiang"

* tag 'x86_pasid_for_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction
  x86/asm: Carve out a generic movdir64b() helper for general usage
  x86/mmu: Allocate/free a PASID
  x86/cpufeatures: Mark ENQCMD as disabled when configured out
  mm: Add a pasid member to struct mm_struct
  x86/msr-index: Define an IA32_PASID MSR
  x86/fpu/xstate: Add supervisor PASID state for ENQCMD
  x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
  Documentation/x86: Add documentation for SVA (Shared Virtual Addressing)
  iommu/vt-d: Change flags type to unsigned int in binding mm
  drm, iommu: Change type of pasid to u32
…kernel/git/tip/tip

Pull misc x86 fixes fromm Borislav Petkov:

 - Ratelimit the message about writes to unrecognized MSRs so that they
   don't spam the console log (Chris Down)

 - Document how the /proc/cpuinfo machinery works for future reference
   (Kyung Min Park, Ricardo Neri and Dave Hansen)

 - Correct the current NMI's duration calculation (Libing Zhou)

* tag 'x86_misc_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Fix nmi_handle() duration miscalculation
  Documentation/x86: Add documentation for /proc/cpuinfo feature flags
  x86/msr: Make source of unrecognised MSR writes unambiguous
  x86/msr: Prevent userspace MSR access from dominating the console
…nux/kernel/git/tip/tip

Pull x86 fsgsbase updates from Borislav Petkov:
 "Misc minor cleanups and corrections to the fsgsbase code and
  respective selftests"

* tag 'x86_fsgsbase_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/x86/fsgsbase: Test PTRACE_PEEKUSER for GSBASE with invalid LDT GS
  selftests/x86/fsgsbase: Reap a forgotten child
  x86/fsgsbase: Replace static_cpu_has() with boot_cpu_has()
  x86/entry/64: Correct the comment over SAVE_AND_SET_GSBASE
…ernel/git/tip/tip

Pull x86 fpu updates from Borislav Petkov:

 - Allow clearcpuid= to accept multiple bits (Arvind Sankar)

 - Move clearcpuid= parameter handling earlier in the boot, away from
   the FPU init code and to a generic location (Mike Hommey)

* tag 'x86_fpu_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Handle FPU-related and clearcpuid command line arguments earlier
  x86/fpu: Allow multiple bits in clearcpuid= parameter
…nux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:
 "Misc minor cleanups"

* tag 'x86_cleanups_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Fix typo in comments for syscall_enter_from_user_mode()
  x86/resctrl: Fix spelling in user-visible warning messages
  x86/entry/64: Do not include inst.h in calling.h
  x86/mpparse: Remove duplicate io_apic.h include
…/kernel/git/tip/tip

Pull x86 cache resource control updates from Borislav Petkov:

 - Misc cleanups to the resctrl code in preparation for the ARM side
   (James Morse)

 - Add support for controlling per-thread memory bandwidth throttling
   delay values on hw which supports it (Fenghua Yu)

* tag 'x86_cache_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Enable user to view thread or core throttling mode
  x86/resctrl: Enumerate per-thread MBA controls
  cacheinfo: Move resctrl's get_cache_id() to the cacheinfo header file
  x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps
  x86/resctrl: Merge AMD/Intel parse_bw() calls
  x86/resctrl: Add struct rdt_membw::arch_needs_linear to explain AMD/Intel MBA difference
  x86/resctrl: Use is_closid_match() in more places
  x86/resctrl: Include pid.h
  x86/resctrl: Use container_of() in delayed_work handlers
  x86/resctrl: Fix stale comment
  x86/resctrl: Remove struct rdt_membw::max_delay
  x86/resctrl: Remove unused struct mbm_state::chunks_bw
…kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:
 "A single fix making the error message when the opcode bytes at rIP
  cannot be accessed during an oops, more precise"

* tag 'x86_core_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/dumpstack: Fix misleading instruction pointer error message
@pull pull bot added the ⤵️ pull label Oct 12, 2020
@pull pull bot merged commit 1e6d1d9 into linux-doc:master Oct 12, 2020
pull bot pushed a commit that referenced this pull request Aug 26, 2022
…nsigned fw_level

Though acpi_find_last_cache_level() always returned signed value and the
document states it will return any errors caused by lack of a PPTT table,
it never returned negative values before.

Commit 0c80f9e ("ACPI: PPTT: Leave the table mapped for the runtime usage")
however changed it by returning -ENOENT if no PPTT was found. The value
returned from acpi_find_last_cache_level() is then assigned to unsigned
fw_level.

It will result in the number of cache leaves calculated incorrectly as
a huge value which will then cause the following warning from __alloc_pages
as the order would be great than MAX_ORDER because of incorrect and huge
cache leaves value.

  |  WARNING: CPU: 0 PID: 1 at mm/page_alloc.c:5407 __alloc_pages+0x74/0x314
  |  Modules linked in:
  |  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-10393-g7c2a8d3ac4c0 #73
  |  pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  |  pc : __alloc_pages+0x74/0x314
  |  lr : alloc_pages+0xe8/0x318
  |  Call trace:
  |   __alloc_pages+0x74/0x314
  |   alloc_pages+0xe8/0x318
  |   kmalloc_order_trace+0x68/0x1dc
  |   __kmalloc+0x240/0x338
  |   detect_cache_attributes+0xe0/0x56c
  |   update_siblings_masks+0x38/0x284
  |   store_cpu_topology+0x78/0x84
  |   smp_prepare_cpus+0x48/0x134
  |   kernel_init_freeable+0xc4/0x14c
  |   kernel_init+0x2c/0x1b4
  |   ret_from_fork+0x10/0x20

Fix the same by changing fw_level to be signed integer and return the
error from init_cache_level() early in case of error.

Reported-and-Tested-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20220808084640.3165368-1-sudeep.holla@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
pull bot pushed a commit that referenced this pull request Oct 11, 2022
This is a BUG_ON issue as follows when running xfstest-generic-503:
WARNING: CPU: 21 PID: 1385 at fs/f2fs/inode.c:762 f2fs_evict_inode+0x847/0xaa0
Modules linked in:
CPU: 21 PID: 1385 Comm: umount Not tainted 5.19.0-rc5+ #73
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014

Call Trace:
evict+0x129/0x2d0
dispose_list+0x4f/0xb0
evict_inodes+0x204/0x230
generic_shutdown_super+0x5b/0x1e0
kill_block_super+0x29/0x80
kill_f2fs_super+0xe6/0x140
deactivate_locked_super+0x44/0xc0
deactivate_super+0x79/0x90
cleanup_mnt+0x114/0x1a0
__cleanup_mnt+0x16/0x20
task_work_run+0x98/0x100
exit_to_user_mode_prepare+0x3d0/0x3e0
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x42/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0

Function flow analysis when BUG occurs:
f2fs_fallocate                    mmap
                                  do_page_fault
                                    pte_spinlock  // ---lock_pte
                                    do_wp_page
                                      wp_page_shared
                                        pte_unmap_unlock   // unlock_pte
                                          do_page_mkwrite
                                          f2fs_vm_page_mkwrite
                                            down_read(invalidate_lock)
                                            lock_page
                                            if (PageMappedToDisk(page))
                                              goto out;
                                            // set_page_dirty  --NOT RUN
                                            out: up_read(invalidate_lock);
                                        finish_mkwrite_fault // unlock_pte
f2fs_collapse_range
  down_write(i_mmap_sem)
  truncate_pagecache
    unmap_mapping_pages
      i_mmap_lock_write // down_write(i_mmap_rwsem)
        ......
        zap_pte_range
          pte_offset_map_lock // ---lock_pte
           set_page_dirty
            f2fs_dirty_data_folio
              if (!folio_test_dirty(folio)) {
                                        fault_dirty_shared_page
                                          set_page_dirty
                                            f2fs_dirty_data_folio
                                              if (!folio_test_dirty(folio)) {
                                                filemap_dirty_folio
                                                f2fs_update_dirty_folio // ++
                                              }
                                            unlock_page
                filemap_dirty_folio
                f2fs_update_dirty_folio // page count++
              }
          pte_unmap_unlock  // --unlock_pte
      i_mmap_unlock_write  // up_write(i_mmap_rwsem)
  truncate_inode_pages
  up_write(i_mmap_sem)

When race happens between mmap-do_page_fault-wp_page_shared and
fallocate-truncate_pagecache-zap_pte_range, the zap_pte_range calls
function set_page_dirty without page lock. Besides, though
truncate_pagecache has immap and pte lock, wp_page_shared calls
fault_dirty_shared_page without any. In this case, two threads race
in f2fs_dirty_data_folio function. Page is set to dirty only ONCE,
but the count is added TWICE by calling filemap_dirty_folio.
Thus the count of dirty page cannot accord with the real dirty pages.

Following is the solution to in case of race happens without any lock.
Since folio_test_set_dirty in filemap_dirty_folio is atomic, judge return
value will not be at risk of race.

Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
pull bot pushed a commit that referenced this pull request Nov 10, 2023
This command:

$ perf record -e cycles:k -e instructions:k -c 10000 -m 64M dd if=/dev/zero of=/dev/null count=1000

gives rise to this kernel warning:

[  444.364395] WARNING: CPU: 0 PID: 104 at kernel/smp.c:775 smp_call_function_many_cond+0x42c/0x436
[  444.364515] Modules linked in:
[  444.364657] CPU: 0 PID: 104 Comm: perf-exec Not tainted 6.6.0-rc6-00051-g391df82e8ec3-dirty #73
[  444.364771] Hardware name: riscv-virtio,qemu (DT)
[  444.364868] epc : smp_call_function_many_cond+0x42c/0x436
[  444.364917]  ra : on_each_cpu_cond_mask+0x20/0x32
[  444.364948] epc : ffffffff8009f9e0 ra : ffffffff8009fa5a sp : ff20000000003800
[  444.364966]  gp : ffffffff81500aa0 tp : ff60000002b83000 t0 : ff200000000038c0
[  444.364982]  t1 : ffffffff815021f0 t2 : 000000000000001f s0 : ff200000000038b0
[  444.364998]  s1 : ff60000002c54d98 a0 : ff60000002a73940 a1 : 0000000000000000
[  444.365013]  a2 : 0000000000000000 a3 : 0000000000000003 a4 : 0000000000000100
[  444.365029]  a5 : 0000000000010100 a6 : 0000000000f00000 a7 : 0000000000000000
[  444.365044]  s2 : 0000000000000000 s3 : ffffffffffffffff s4 : ff60000002c54d98
[  444.365060]  s5 : ffffffff81539610 s6 : ffffffff80c20c48 s7 : 0000000000000000
[  444.365075]  s8 : 0000000000000000 s9 : 0000000000000001 s10: 0000000000000001
[  444.365090]  s11: ffffffff80099394 t3 : 0000000000000003 t4 : 00000000eac0c6e6
[  444.365104]  t5 : 0000000400000000 t6 : ff60000002e010d0
[  444.365120] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[  444.365226] [<ffffffff8009f9e0>] smp_call_function_many_cond+0x42c/0x436
[  444.365295] [<ffffffff8009fa5a>] on_each_cpu_cond_mask+0x20/0x32
[  444.365311] [<ffffffff806e90dc>] pmu_sbi_ctr_start+0x7a/0xaa
[  444.365327] [<ffffffff806e880c>] riscv_pmu_start+0x48/0x66
[  444.365339] [<ffffffff8012111a>] perf_adjust_freq_unthr_context+0x196/0x1ac
[  444.365356] [<ffffffff801237aa>] perf_event_task_tick+0x78/0x8c
[  444.365368] [<ffffffff8003faf4>] scheduler_tick+0xe6/0x25e
[  444.365383] [<ffffffff8008a042>] update_process_times+0x80/0x96
[  444.365398] [<ffffffff800991ec>] tick_sched_handle+0x26/0x52
[  444.365410] [<ffffffff800993e4>] tick_sched_timer+0x50/0x98
[  444.365422] [<ffffffff8008a6aa>] __hrtimer_run_queues+0x126/0x18a
[  444.365433] [<ffffffff8008b350>] hrtimer_interrupt+0xce/0x1da
[  444.365444] [<ffffffff806cdc60>] riscv_timer_interrupt+0x30/0x3a
[  444.365457] [<ffffffff8006afa6>] handle_percpu_devid_irq+0x80/0x114
[  444.365470] [<ffffffff80065b82>] generic_handle_domain_irq+0x1c/0x2a
[  444.365483] [<ffffffff8045faec>] riscv_intc_irq+0x2e/0x46
[  444.365497] [<ffffffff808a9c62>] handle_riscv_irq+0x4a/0x74
[  444.365521] [<ffffffff808aa760>] do_irq+0x7c/0x7e
[  444.365796] ---[ end trace 0000000000000000 ]---

That's because the fix in commit 3fec323 ("drivers: perf: Fix panic
in riscv SBI mmap support") was wrong since there is no need to broadcast
to other cpus when starting a counter, that's only needed in mmap when
the counters could have already been started on other cpus, so simply
remove this broadcast.

Fixes: 3fec323 ("drivers: perf: Fix panic in riscv SBI mmap support")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Clément Léger <cleger@rivosinc.com>
Tested-by: Yu Chien Peter Lin <peterlin@andestech.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> #On
Link: https://lore.kernel.org/r/20231026084010.11888-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
pull bot pushed a commit that referenced this pull request Jan 27, 2025
If a timerlat tracer is started with the osnoise option OSNOISE_WORKLOAD
disabled, but then that option is enabled and timerlat is removed, the
tracepoints that were enabled on timerlat registration do not get
disabled. If the option is disabled again and timelat is started, then it
triggers a warning in the tracepoint code due to registering the
tracepoint again without ever disabling it.

Do not use the same user space defined options to know to disable the
tracepoints when timerlat is removed. Instead, set a global flag when it
is enabled and use that flag to know to disable the events.

 ~# echo NO_OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options
 ~# echo timerlat > /sys/kernel/tracing/current_tracer
 ~# echo OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options
 ~# echo nop > /sys/kernel/tracing/current_tracer
 ~# echo NO_OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options
 ~# echo timerlat > /sys/kernel/tracing/current_tracer

Triggers:

 ------------[ cut here ]------------
 WARNING: CPU: 6 PID: 1337 at kernel/tracepoint.c:294 tracepoint_add_func+0x3b6/0x3f0
 Modules linked in:
 CPU: 6 UID: 0 PID: 1337 Comm: rtla Not tainted 6.13.0-rc4-test-00018-ga867c441128e-dirty #73
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 RIP: 0010:tracepoint_add_func+0x3b6/0x3f0
 Code: 48 8b 53 28 48 8b 73 20 4c 89 04 24 e8 23 59 11 00 4c 8b 04 24 e9 36 fe ff ff 0f 0b b8 ea ff ff ff 45 84 e4 0f 84 68 fe ff ff <0f> 0b e9 61 fe ff ff 48 8b 7b 18 48 85 ff 0f 84 4f ff ff ff 49 8b
 RSP: 0018:ffffb9b003a87ca0 EFLAGS: 00010202
 RAX: 00000000ffffffef RBX: ffffffff92f30860 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff9bf59e91ccd0 RDI: ffffffff913b6410
 RBP: 000000000000000a R08: 00000000000005c7 R09: 0000000000000002
 R10: ffffb9b003a87ce0 R11: 0000000000000002 R12: 0000000000000001
 R13: ffffb9b003a87ce0 R14: ffffffffffffffef R15: 0000000000000008
 FS:  00007fce81209240(0000) GS:ffff9bf6fdd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055e99b728000 CR3: 00000001277c0002 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  ? __warn.cold+0xb7/0x14d
  ? tracepoint_add_func+0x3b6/0x3f0
  ? report_bug+0xea/0x170
  ? handle_bug+0x58/0x90
  ? exc_invalid_op+0x17/0x70
  ? asm_exc_invalid_op+0x1a/0x20
  ? __pfx_trace_sched_migrate_callback+0x10/0x10
  ? tracepoint_add_func+0x3b6/0x3f0
  ? __pfx_trace_sched_migrate_callback+0x10/0x10
  ? __pfx_trace_sched_migrate_callback+0x10/0x10
  tracepoint_probe_register+0x78/0xb0
  ? __pfx_trace_sched_migrate_callback+0x10/0x10
  osnoise_workload_start+0x2b5/0x370
  timerlat_tracer_init+0x76/0x1b0
  tracing_set_tracer+0x244/0x400
  tracing_set_trace_write+0xa0/0xe0
  vfs_write+0xfc/0x570
  ? do_sys_openat2+0x9c/0xe0
  ksys_write+0x72/0xf0
  do_syscall_64+0x79/0x1c0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250123204159.4450c88e@gandalf.local.home
Fixes: e88ed22 ("tracing/timerlat: Add user-space interface")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
pull bot pushed a commit that referenced this pull request Jan 25, 2026
Revert commit bfc467d ("serial: remove redundant
tty_port_link_device()") because the tty_port_link_device() is not
redundant: the tty->port has to be confured before we call
uart_configure_port(), otherwise user-space can open console without TTY
linked to the driver.

This tty_port_link_device() was added explicitly to avoid this exact
issue in commit fb2b900 ("tty: link tty and port before configuring
it as console"), so offending commit basically reverted the fix saying
it is redundant without addressing the actual race condition presented
there.

Reproducible always as tty->port warning on Qualcomm SoC with most of
devices disabled, so with very fast boot, and one serial device being
the console:

  printk: legacy console [ttyMSM0] enabled
  printk: legacy console [ttyMSM0] enabled
  printk: legacy bootconsole [qcom_geni0] disabled
  printk: legacy bootconsole [qcom_geni0] disabled
  ------------[ cut here ]------------
  tty_init_dev: ttyMSM driver does not set tty->port. This would crash the kernel. Fix the driver!
  WARNING: drivers/tty/tty_io.c:1414 at tty_init_dev.part.0+0x228/0x25c, CPU#2: systemd/1
  Modules linked in: socinfo tcsrcc_eliza gcc_eliza sm3_ce fuse ipv6
  CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G S                  6.19.0-rc4-next-20260108-00024-g2202f4d30aa8 #73 PREEMPT
  Tainted: [S]=CPU_OUT_OF_SPEC
  Hardware name: Qualcomm Technologies, Inc. Eliza (DT)
  ...
  tty_init_dev.part.0 (drivers/tty/tty_io.c:1414 (discriminator 11)) (P)
  tty_open (arch/arm64/include/asm/atomic_ll_sc.h:95 (discriminator 3) drivers/tty/tty_io.c:2073 (discriminator 3) drivers/tty/tty_io.c:2120 (discriminator 3))
  chrdev_open (fs/char_dev.c:411)
  do_dentry_open (fs/open.c:962)
  vfs_open (fs/open.c:1094)
  do_open (fs/namei.c:4634)
  path_openat (fs/namei.c:4793)
  do_filp_open (fs/namei.c:4820)
  do_sys_openat2 (fs/open.c:1391 (discriminator 3))
  ...
  Starting Network Name Resolution...

Apparently the flow with this small Yocto-based ramdisk user-space is:

driver (qcom_geni_serial.c):                  user-space:
============================                  ===========
qcom_geni_serial_probe()
 uart_add_one_port()
  serial_core_register_port()
   serial_core_add_one_port()
    uart_configure_port()
     register_console()
    |
    |                                         open console
    |                                          ...
    |                                          tty_init_dev()
    |                                           driver->ports[idx] is NULL
    |
    tty_port_register_device_attr_serdev()
     tty_port_link_device() <- set driver->ports[idx]

Fixes: bfc467d ("serial: remove redundant tty_port_link_device()")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://patch.msgid.link/20260123072139.53293-2-krzysztof.kozlowski@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.